Similarity Pyramid: Browsing a Document Database with Respect to Visual Similarity

Authors

  • Ildus Ahmadullin
  • Jan Allebach

Abstract

Ildus Ahmadullin, Jan Allebach. School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907-1285, U.S.A.

HP Laboratories, HPL-2012-21

Keywords: document database management; document visual similarity; similarity pyramid; Isomap

Managing large document databases has become an important task. Two desirable applications are sorting documents by their visual similarity and layout features, and visualizing the whole document database. A user may wish to search a database for documents whose stylistic features are similar to those of a query, or may want to browse the whole database. For both tasks, clustering similar documents and organizing the database around those clusters is preferable to presenting documents in random order.

In this paper, we propose organizing single-page documents in a 3-D hierarchical structure called a similarity pyramid. The pyramid is constructed from a stack of embeddings of the document database on a 2-D surface, computed with a nonlinear dimensionality reduction algorithm called Isomap. The mapping preserves similarity distances between documents: documents that are close to each other in a feature space are mapped to nearby points on a low-dimensional surface. Higher levels of the pyramid consist of document image icons that each represent a large group of roughly similar documents, whereas lower levels contain icons representing small groups of very similar documents. A user can browse the database by moving along a given level of the pyramid or by moving between levels.

External Posting Date: February 6, 2012. Approved for External Publication.
Internal Posting Date: February 6, 2012.
Copyright 2012 Hewlett-Packard Development Company, L.P.
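The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names: an Isomap-style embedding of feature vectors onto a 2-D surface, and a stack of coarser-to-finer groupings of that embedding. It uses plain Python and the textbook Isomap steps (k-nearest-neighbor graph, shortest-path distances, classical multidimensional scaling). The function names, the feature vectors, and especially the grid-based `build_pyramid` grouping are assumptions made for illustration, not the authors' construction.

```python
import math

def geodesic_distances(points, k):
    """Shortest-path distances over a symmetric k-nearest-neighbor graph."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    inf = float("inf")
    graph = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        # Skip position 0 of the sorted list, which is normally the point itself.
        for j in sorted(range(n), key=lambda j: dist[i][j])[1:k + 1]:
            graph[i][j] = graph[j][i] = dist[i][j]
    for m in range(n):  # Floyd-Warshall all-pairs shortest paths
        for i in range(n):
            for j in range(n):
                via = graph[i][m] + graph[m][j]
                if via < graph[i][j]:
                    graph[i][j] = via
    return graph

def classical_mds(dist, dims=2, iters=500):
    """Embed a distance matrix using the top eigenvectors of the centered Gram matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J (D is symmetric here).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        v = [math.sin(i + 1 + d) for i in range(n)]  # deterministic start vector
        for _ in range(iters):  # power iteration
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][d] = v[i] * scale
            for j in range(n):  # deflate so the next pass finds the next eigenvector
                b[i][j] -= eigval * v[i] * v[j]
    return coords

def isomap(points, k=5, dims=2):
    """Isomap sketch; assumes the k-NN graph is connected."""
    return classical_mds(geodesic_distances(points, k), dims)

def build_pyramid(coords, levels=3):
    """Hypothetical pyramid: level l bins the 2-D embedding into a 2^l x 2^l grid.

    Returns one dict per level mapping grid cell -> list of document indices.
    """
    xs, ys = [c[0] for c in coords], [c[1] for c in coords]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0
    span_y = (max(ys) - min_y) or 1.0
    pyramid = []
    for level in range(levels):
        cells = 2 ** level
        groups = {}
        for idx, (x, y) in enumerate(coords):
            cx = min(int((x - min_x) / span_x * cells), cells - 1)
            cy = min(int((y - min_y) / span_y * cells), cells - 1)
            groups.setdefault((cx, cy), []).append(idx)
        pyramid.append(groups)
    return pyramid

if __name__ == "__main__":
    # Made-up 2-D "feature vectors" lying on a half circle.
    features = [(math.cos(math.pi * i / 11), math.sin(math.pi * i / 11)) for i in range(12)]
    embedding = isomap(features, k=2)
    for level, groups in enumerate(build_pyramid(embedding)):
        print(level, sorted(groups.values()))
```

In this sketch the top level holds a single group containing every document, and each lower level splits the groups of the level above, which mirrors the coarse-to-fine browsing described in the abstract. Any member of a group could serve as its icon; how the paper actually selects representatives is not stated here.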

Related resources

ViBE: A New Paradigm for Video Database Browsing and Search

Video Browsing Environment (ViBE) is a unique browseable/searchable paradigm for organizing video databases containing a large number of sequences. The system first segments video sequences into shots by using the Generalized Trace obtained from the DC-sequence of the compressed data stream. Each video shot is then represented by a hierarchical tree structure of key frames, and the shots are a...

IMAGE DATABASE MANAGEMENT USING SIMILARITY PYRAMIDS A Thesis

Chen, Jau-Yuen, Ph.D., Purdue University, May, 1999. Image Database Management using Similarity Pyramids. Major Professor: Charles A. Bouman. In this work, four major components of image database have been examined: image similarity, search-by-query, browsing environments, and user feedback. We first present a formal framework for designing image similarity and search algorithms. This framework...

Active browsing using similarity pyramids

In this paper, we describe a new approach to managing large image databases which we call active browsing. Active browsing integrates relevance feedback into the browsing environment, so that users can modify the database’s organization to suit the desired task. Our method is based on a similarity pyramid data structure which hierarchically organizes the database so that it can be efficiently b...

Similarity pyramids for browsing and organization of large image databases

The advent of large image databases (>10,000) has created a need for tools which can search and organize images automatically by their content. This paper presents a method for designing a hierarchical browsing environment which we call a similarity pyramid. The similarity pyramid groups similar images together while allowing users to view the database at varying levels of resolution. We show t...

Hierarchical browsing and search of large image databases

The advent of large image databases (>10000) has created a need for tools which can search and organize images automatically by their content. This paper focuses on the use of hierarchical tree-structures to both speed-up search-by-query and organize databases for effective browsing. The first part of this paper develops a fast search algorithm based on best-first branch and bound search. This ...

Publication date: 2012